The following is a brief linguistic analysis of the use of racially charged language in William Faulkner’s Absalom, Absalom!. Faulkner’s representation of race was complicated, just as his own relationship with race was complex. As a Southern white moderate, he voiced his anguish over the dehumanization of African Americans under Jim Crow segregation, and, at the same time, could also casually refer to people as “niggers” during the public retelling of a comic story. Indeed, there is no shortage of literature on Faulkner and race in general, and on Absalom, Absalom! in particular. Given this extensive critical history, it almost goes without saying that a computational analysis of word choice, especially with regard to racially charged language, cannot do justice to the complexities and nuances of either the text or Faulkner’s broader critical intervention. Nevertheless, using techniques common in Corpus Linguistics (CL), it is possible to give a bird’s-eye view of how the use of certain words is patterned. This pattern can, in turn, inform subsequent close readings.
The analysis reveals that Faulkner’s use of racial language is not incidental but carefully patterned. This is especially notable with perhaps the most charged word in American English: “nigger.” Faulkner makes extensive use of this word throughout the text. As the data reveal, the n-word occurs most frequently in relation to Thomas Sutpen, and in particular to his origin story, set in the 1830s in Tidewater, Virginia. It is the moment he becomes simultaneously race and class conscious, when he is rebuffed at the front entrance of a plantation by an enslaved butler. The scene, and Sutpen’s origin writ large, will replay itself through the voices of the text’s narrators. Although these voices are distinct, it would be a mistake to see them as singular. The linguistic analysis reveals overlapping patterns between narrators that speak of a communal story of Sutpen, told through differences in tone, register, and inflection, but always with the same story material. The narrators do not own their voices any more than they own another person’s story. They are the bards of its transmission.
Needless to say, the n-word occurs frequently in this analysis. It is expressed only in its full form when directly quoting Faulkner. Otherwise, its full version has been deemed gratuitous and unnecessary for data analysis.
The following piece uses several techniques available to standard CL analysis, alongside a more complex analysis exclusively available to practitioners who have access to the Digital Yoknapatawpha data set. These different techniques have been split into their own sections.
With any textual analysis, some pre-processing is required. The steps that follow are standard procedures in CL. The text of Absalom, Absalom! was read in as a .txt file. It was then broken into its nine chapters and further divided into sentences. The individual words were subsequently “tokenized”: each sentence was split into individual words, and capital letters, special characters, and punctuation were removed. This enables the computer to compare words more easily. Stop words were then removed. These are words like the, a, on, and at that are very frequent in any text and do not add to the analysis. The words were then lemmatized. Lemmatization reduces a word to its base dictionary form, or lemma. For example, “Negroes” becomes “Negro.” This way, all instances of the concept “Negro” are unified under one form, which prevents creating separate counts for Negro, Negroes, and Negro’s.
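The pre-processing steps above can be sketched as follows. This is a minimal Python illustration, not the author's R/tidyverse code: the sentence splitter is a naive regular expression, and the stop-word list and lemma table (`STOP_WORDS`, `LEMMAS`) are tiny hypothetical stand-ins for a full stop-word list and a real lemmatizer. Chapter splitting is omitted for brevity.

```python
import re

# Hypothetical stand-ins: a real pipeline would use a full stop-word list
# and a proper lemmatizer instead of these hand-written tables.
STOP_WORDS = {"the", "a", "an", "on", "at", "of", "and", "to", "in", "he", "his", "was", "with"}
LEMMAS = {"negroes": "negro", "negro's": "negro", "men": "man", "beasts": "beast"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Lowercase, normalize curly apostrophes, and keep only letter runs."""
    cleaned = sentence.lower().replace("\u2019", "'")
    # Runs of letters, optionally with one internal apostrophe (e.g. "negro's").
    return re.findall(r"[a-z]+(?:'[a-z]+)?", cleaned)


def preprocess(text: str) -> list[tuple[int, str]]:
    """Return (sentence_id, lemma) pairs with stop words removed."""
    rows = []
    for sentence_id, sentence in enumerate(split_sentences(text), start=1):
        for token in tokenize(sentence):
            if token in STOP_WORDS:
                continue
            rows.append((sentence_id, LEMMAS.get(token, token)))
    return rows


print(preprocess("The Negroes waited at the gate. Henry rode on!"))
# → [(1, 'negro'), (1, 'waited'), (1, 'gate'), (2, 'henry'), (2, 'rode')]
```

A splitter this simple also breaks on abbreviations such as “Mr. Compson” and on the exclamation mark in the novel's title, which is one reason a dedicated sentence tokenizer is preferable on the real text.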
The resulting slate of words was tagged for racially charged language by adding a column called race_word that indicates TRUE or FALSE for each word. This was done by creating a list of racial words and joining it to the data table through a left join. Essentially, the join checks whether a word like “Negro,” “White,” or “Octoroon” occurs and, if so, tags it as TRUE. Such a list of racial words is necessarily imperfect, as the words “black” and “white” can also denote colors rather than racial designations. Still, with this pre-processing complete, it is possible to provide some key statistical insights.
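The tagging step can be sketched as a left join in plain Python. Every word row is kept (the left side); a row picks up `race_word = True` only when it matches the lexicon, and unmatched rows, which would receive `NA` in an R join, are filled with `False`. The `RACE_WORDS` list here is a hypothetical, abbreviated stand-in for the author's list, and, like the original, it cannot tell “white” the color from “white” the racial designation.

```python
# Hypothetical, abbreviated lexicon; the author's actual list is not reproduced.
RACE_WORDS = [
    {"word": "negro", "race_word": True},
    {"word": "white", "race_word": True},
    {"word": "black", "race_word": True},
    {"word": "octoroon", "race_word": True},
    {"word": "blood", "race_word": True},
]


def left_join_race_words(tokens: list[dict], lexicon: list[dict] = RACE_WORDS) -> list[dict]:
    """Keep every token row; add race_word from the lexicon, defaulting to False."""
    lookup = {row["word"]: row["race_word"] for row in lexicon}
    # dict.get supplies False where a left join would leave a missing value.
    return [{**row, "race_word": lookup.get(row["word"], False)} for row in tokens]


tokens = [
    {"sentence_id": 1, "word": "white"},
    {"sentence_id": 1, "word": "horse"},
]
print(left_join_race_words(tokens))
```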
The chart below shows the ten most frequent racial words (TRUE) and the ten most frequent non-racial words (FALSE) in the text. Hovering over the individual bars reveals their precise counts, and clicking on TRUE or FALSE turns that series on and off.
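Counting the most frequent words in each group reduces to a frequency count split on the race_word flag. A minimal Python sketch, assuming rows that carry a `word` and a boolean `race_word` field as described above; it computes only the counts, not the interactive plotly chart.

```python
from collections import Counter


def top_words_by_group(tagged: list[dict], n: int = 10) -> dict[bool, list[tuple[str, int]]]:
    """Return the n most frequent words for race_word == True and == False."""
    counts = {True: Counter(), False: Counter()}
    for row in tagged:
        counts[row["race_word"]][row["word"]] += 1
    return {group: counter.most_common(n) for group, counter in counts.items()}


tagged = (
    [{"word": "henry", "race_word": False}] * 3
    + [{"word": "negro", "race_word": True}] * 2
    + [{"word": "white", "race_word": True}]
)
print(top_words_by_group(tagged, n=2))
# → {True: [('negro', 2), ('white', 1)], False: [('henry', 3)]}
```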
What is immediately noticeable is that the n-word is the most frequent racial term. It occurs 46 more times than the word “negro.” It occurs about a third as often as the word Henry (one of the main characters) and about half as often as the racially ambiguous Charles Bon. Importantly, the number of times a character’s name appears is not the same as the number of times that character is referred to in the text. After all, the pronouns “he” or “she” could equally well denote a character, but those references are not counted here.
Collocation analysis determines which words tend to appear together. This is done by creating n-grams, sequences of n consecutive words. By examining the n-grams around particular words, we can get a better sense of their context. For example, in her research on British newspapers, Dawn Archer has shown that the most common bigram (n-gram of two) for Muslim is “Muslim terrorist” (Archer 2016). This strong association between the two words indicates how Muslims are represented in the British media. In similar fashion, we get a better sense of how Faulkner uses racial language by looking at the words immediately before and after racial terms.
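Building n-grams and reading off the words immediately before and after a target word can be sketched in a few lines of Python. The function names and toy sentences are illustrative only; bigrams are built within each sentence so that no pair spans a sentence boundary.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All runs of n consecutive tokens, e.g. n=2 gives bigrams."""
    return list(zip(*(tokens[i:] for i in range(n))))


def collocates(sentences: list[list[str]], target: str) -> tuple[Counter, Counter]:
    """Count the words immediately before and after `target`, within sentences."""
    before, after = Counter(), Counter()
    for tokens in sentences:
        for left, right in ngrams(tokens, 2):
            if right == target:
                before[left] += 1
            if left == target:
                after[right] += 1
    return before, after


sentences = [["band", "wild", "negro", "tamed"], ["wild", "negro"], ["negro", "cabin"]]
before, after = collocates(sentences, "negro")
print(before.most_common(1))  # → [('wild', 2)]
```

Whether stop words are removed before the n-grams are built changes which neighbors are counted, so that choice should match the question being asked.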
The phrase that stands out the most is one that Rosa Coldfield uses early on: “wild niggers” (4). It becomes a leitmotif for much of the text, and the phrase will be repeated throughout. Yet who repeats it, and how it is repeated, will change.
In their use of either “wild niggers” or “wild negro,” Quentin and Rosa Coldfield share an inverse relationship. This is curious because it is Rosa who first uses the phrase when referring to the demonic Sutpen arriving in Yoknapatawpha:
Out of quiet thunderclap he would abrupt (man-horse-demon) upon a scene peaceful and decorous as a schoolprize water color, faint sulphur-reek still in hair clothes and beard, with grouped behind him his band of wild niggers like beasts half tamed to walk upright like men, in attitudes wild and reposed, and manacled among them the French architect with his air grim, haggard, and tatter-ran.
It is this initial instance of the phrase, uttered by Rosa, that is carried forward through the rest of the text. Interestingly, Quentin takes up this note and appears to repeat it throughout. What’s more, Rosa’s initial association between enslavement and wildness is another echo that reverberates, despite the fact that she says it only once. It is the first hearing of the words that provides a mold for all other versions.
We can also look at the word frequency data sequentially by casting it across the chapters. This indicates the frequency of a particular word in each chapter. It may be that some racial words are used in one part of the book and not in others. This gives some indication as to its value in the narrative.
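Casting word frequencies across chapters amounts to counting (chapter, word) pairs. A minimal Python sketch, assuming the tokens carry a chapter number from the pre-processing step; the row layout and sample data are hypothetical.

```python
from collections import Counter


def counts_by_chapter(rows: list[tuple[int, str]], words: list[str]) -> dict[str, Counter]:
    """Map each word of interest to a Counter of {chapter: occurrences}."""
    table = {word: Counter() for word in words}
    for chapter, word in rows:
        if word in table:
            table[word][chapter] += 1
    return table


rows = [(1, "negro"), (7, "negro"), (7, "negro"), (7, "white"), (8, "black")]
table = counts_by_chapter(rows, ["negro", "white", "black"])
print(table["negro"][7], table["negro"][2])  # → 2 0
```

Using a `Counter` per word means a chapter in which a word never appears reads as 0 rather than raising an error, which keeps chapters without a given word visible as zeros when the table is plotted.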
Chapter 7 is particularly racially charged. While the text’s various narrators (Rosa, Mr. Compson, Quentin, Shreve) predominate in certain chapters, it would be a mistake to attribute particular words to particular characters based on this raw data. Recall that chapter 7 is a nested narrative in which we are told the story of Thomas Sutpen as he related it to General Compson, who told it to Mr. Compson, who told it to Quentin, who is telling it to Shreve. With so many narrative frames, it is very difficult to determine whose language this is. What is apparent is that the chapter in which most of Sutpen’s life is revealed is steeped in pejorative racist language. By contrast, in all the other chapters the words “negro” or “black” are used more frequently to describe African Americans.
Sentiment analysis is a field of CL that tries to establish the emotional valence of a segment of text. It does so through sentiment libraries: lists of words that have been hand-coded to indicate broadly defined emotions like joy, sadness, or surprise, or, more generically, positive and negative sentiment. In general, sentiment libraries are used for analyzing social media or large data sets, where the language tends to be less complex and the analysis operates at scale. Thus, while the sentiment dictionary might not match each sentiment exactly, in the aggregate the predominant emotion rises to the top.
For literary works, sentiment analysis is far more speculative and merits considerable caution. Without a dictionary specially trained for a specific corpus, sentiment analysis can reveal certain patterns around words, but it is unclear what the margin of error might be. There are, so to speak, unknown unknowns. This is particularly true of Faulkner, who uses many emotionally charged words that might not make their way into a sentiment library, or who uses words like “unamaze” to negate a particular emotion, in this case surprise. Any results that sentiment analysis generates should therefore be seen as a prompt for further inquiry and not as a final result.
One of the most basic ways to think through sentiment is to track positive and negative sentiment across a text. The basic procedure is to tag each positive and negative word in a text and then tabulate these tags by some logical unit, be it a sentence, paragraph, or chapter. This gives the net sentiment of that particular unit. Since we are interested in the emotion surrounding racial words, it makes the most sense to set the unit boundary at the sentence level. This produces a very granular chart, but for Absalom, Absalom! this granularity is very revealing.
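Sentence-level tabulation can be sketched in Python with a toy polarity lexicon. `POLARITY` below is a hypothetical handful of words scored +1 or -1; the source does not say which sentiment library was used, and a published one would be far larger. Each sentence's score is the number of positive words minus the number of negative words it contains.

```python
from collections import defaultdict

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
POLARITY = {"joy": 1, "peaceful": 1, "decorous": 1, "grim": -1, "haggard": -1, "demon": -1}


def sentence_sentiment(rows: list[tuple[int, str]], lexicon: dict[str, int] = POLARITY) -> dict[int, int]:
    """Net sentiment per sentence: (# positive words) - (# negative words)."""
    scores: dict[int, int] = defaultdict(int)
    for sentence_id, word in rows:
        # Words missing from the lexicon add 0 but still register the sentence.
        scores[sentence_id] += lexicon.get(word, 0)
    return dict(scores)


rows = [(1, "peaceful"), (1, "demon"), (1, "grim"), (2, "gate"), (3, "haggard")]
print(sentence_sentiment(rows))  # → {1: -1, 2: 0, 3: -1}
```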
The first thing that stands out about this chart is just how negatively charged the sentences in Absalom, Absalom! are. There are very few positive sentences in this text. The sentences that contain racial words are predominantly negative. On average, they are significantly more negative (-3.32) than sentences without racial words (-1.27). In fact, the sentence with the most negative emotions attached to it is also racially charged. This is sentence 1421, which, at 969 words, is also one of the longest sentences in the text. If you do not know Absalom, Absalom! by sentence, and I hope you don’t, this is the passage that speaks of Sutpen’s dissolution in the wake of the Civil War during his drunken parleys with Wash Jones. The reason for the overabundance of negative emotions is both the sentence length and its grotesque content.
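The comparison of averages can be sketched in Python, assuming a dictionary of per-sentence net scores and word rows tagged with `sentence_id` and `race_word`. The toy numbers are illustrative only and do not reproduce the -3.32 and -1.27 figures.

```python
from statistics import mean


def racial_sentence_ids(tagged: list[dict]) -> set[int]:
    """Sentences containing at least one word tagged race_word == True."""
    return {row["sentence_id"] for row in tagged if row["race_word"]}


def mean_sentiment_split(scores: dict[int, int], racial_ids: set[int]) -> tuple[float, float]:
    """Average net sentiment for sentences with and without racial words."""
    with_race = [score for sid, score in scores.items() if sid in racial_ids]
    without_race = [score for sid, score in scores.items() if sid not in racial_ids]
    # statistics.mean raises StatisticsError if either group is empty.
    return mean(with_race), mean(without_race)


tagged = [
    {"sentence_id": 1, "word": "negro", "race_word": True},
    {"sentence_id": 2, "word": "white", "race_word": True},
    {"sentence_id": 3, "word": "gate", "race_word": False},
]
scores = {1: -4, 2: -2, 3: 0, 4: -2}
print(mean_sentiment_split(scores, racial_sentence_ids(tagged)))  # → (-3, -1)
```

Because net scores are sums, long sentences can accumulate more negative words, as with sentence 1421; dividing each score by sentence length would be one way to check how much of the gap is due to length alone.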
Understanding when a certain word is used and in what emotional context does not necessarily indicate who is using it. There is currently no way to determine who is speaking in Absalom, Absalom! This is a two-fold issue. Practically, there is no way to match the speaker with the racial terms, because the data is not available at that level of granularity. More philosophically, we may also wonder if anyone’s language is truly their own in the text. This is a community that has been shaped by the same story for generations. The cadence, register, and tone all inform particular leitmotifs that occur and reoccur throughout the narrative. Indeed, one of the interesting phenomena that CL reveals is just how often certain turns of phrase are repeated, re-worked, and re-contextualized. The singularity of the speaker is unsettled by the multiplicity of the spoken.
That being said, it is possible to investigate the proximity of racial words relative to characters. The Digital Yoknapatawpha database breaks down a text into events, which are, in turn, composed of locations and characters. By cross-referencing the words with the events, we can get some notion of what words are being used around what characters.2
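Cross-referencing words with events can be sketched in Python. The DY schema is not described here, so the event structure is a hypothetical simplification: each event records the first and last token position (inclusive) of its text span and the characters present or mentioned. Every occurrence of a word of interest inside an event's span is credited to each character in that event.

```python
from collections import Counter


def word_character_counts(tokens: list[str], events: list[dict], words: list[str]) -> Counter:
    """Count (character, word) co-occurrences within hypothetical event spans."""
    wanted = set(words)
    counts: Counter = Counter()
    for event in events:
        span = tokens[event["start"] : event["end"] + 1]  # inclusive end
        for word in span:
            if word in wanted:
                for character in event["characters"]:
                    counts[(character, word)] += 1
    return counts


tokens = ["sutpen", "door", "negro", "butler", "white", "henry", "negro"]
events = [
    {"start": 0, "end": 4, "characters": ["Thomas Sutpen"]},
    {"start": 5, "end": 6, "characters": ["Henry Sutpen", "Thomas Sutpen"]},
]
counts = word_character_counts(tokens, events, ["negro", "white"])
print(counts[("Thomas Sutpen", "negro")], counts[("Henry Sutpen", "negro")])  # → 2 1
```

If events overlap, a single occurrence is credited once per enclosing event, so overlapping spans would need to be deduplicated first.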
While we could count all of the words associated with a character, this is not a relevant statistic. The most meaningful words to track are the five most frequent race words discussed in Figure 1: blood, black, negro, n-word, and white. Because each character necessarily occurs with each word at a different frequency, the top five characters were selected by the number of events in which they appear.
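Selecting the top characters by the number of events in which they appear is a simple frequency count. A Python sketch using hypothetical event dictionaries with a `characters` list; ties are broken by first appearance, a choice of this sketch rather than something the source specifies.

```python
from collections import Counter


def top_characters(events: list[dict], k: int = 5) -> list[str]:
    """The k characters who appear in the most events."""
    appearances = Counter(
        character
        for event in events
        # dict.fromkeys drops duplicate names within one event, keeping order.
        for character in dict.fromkeys(event["characters"])
    )
    return [character for character, _ in appearances.most_common(k)]


events = [
    {"characters": ["Thomas Sutpen", "Henry Sutpen"]},
    {"characters": ["Thomas Sutpen", "Rosa Coldfield"]},
    {"characters": ["Thomas Sutpen", "Henry Sutpen", "Henry Sutpen"]},
]
print(top_characters(events, k=2))  # → ['Thomas Sutpen', 'Henry Sutpen']
```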
The resulting chart is quite revealing. Among most of the characters, the ratio of the word negro to the n-word is relatively even. The most obvious exception is Thomas Sutpen. In events where he is present or mentioned, the n-word occurs 132 times. Part of the reason for this is that Thomas Sutpen occurs in more events than any other character, 320 to be exact. Consequently, it makes sense that he has a higher chance of appearing in those events in which a particular racial word occurs.
In order to get a better view into how often a racial word is used in the same event as a character, we need to normalize the data by the number of times the word occurs. This way we can understand, proportionally, how often a character is in an event when a particular word occurs. We know from the previous chart that the n-word occurs 152 times. Dividing the number of occurrences for each character by this number produces the percentage chart below.
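The normalization is a single division. A minimal Python sketch using the two figures reported here (132 co-occurrences with Thomas Sutpen, 152 total occurrences of the n-word); the dictionary layout is an assumption of this sketch, and the string "n-word" serves as a label in place of the word itself.

```python
def share_of_occurrences(
    counts: dict[tuple[str, str], int], totals: dict[str, int]
) -> dict[tuple[str, str], float]:
    """Percentage of a word's total occurrences that fall in events with a character."""
    return {
        (character, word): 100 * n / totals[word]
        for (character, word), n in counts.items()
    }


shares = share_of_occurrences({("Thomas Sutpen", "n-word"): 132}, {"n-word": 152})
print(round(shares[("Thomas Sutpen", "n-word")]))  # → 87
```

Because one occurrence can fall in an event with several characters, these percentages can sum to more than 100 across characters.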
The chart reveals, quite dramatically, that Thomas Sutpen is, in some way, part of the event in 87% of the instances in which the n-word occurs. While it is not clear that he is using the word, it is clear that he is the character with whom the racial epithet is most associated. Indeed, in one event it occurs 16 times: the moment Sutpen is barred from entering the front door of the plantation by the enslaved butler (187). It is this primal incident that shapes so much of Sutpen’s consciousness going forward. Linguistically, it becomes the gravitational center that draws in the worst racial language American literature has to offer. At the moment that he becomes aware of his class difference, he resorts to racial antagonism.
The use of racial language is also time-bound. Certain words are more prominent during certain periods than others. The DY data also includes speculative dates for each event. These dates consist of both an earliest possible start date and a latest possible end date. Needless to say, establishing Faulkner’s chronology is not an exact science, and the dates are best seen as an approximate measure. Nevertheless, they do give a general indication of when the events take place.
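Grouping events by decade can be sketched as flooring a year to the start of its decade, which is consistent with 1909 falling in the 1900 bin. Which of the two speculative dates was binned is not stated here; this Python sketch assumes the earliest possible start year, and the `earliest_start` and `words` fields are hypothetical.

```python
from collections import Counter


def decade(year: int) -> int:
    """Floor a year to the start of its decade, e.g. 1909 -> 1900."""
    return year // 10 * 10


def word_counts_by_decade(events: list[dict], words: list[str]) -> Counter:
    """Count (decade, word) pairs, binning each event by its earliest start year."""
    wanted = set(words)
    counts: Counter = Counter()
    for event in events:
        bin_start = decade(event["earliest_start"])
        for word in event["words"]:
            if word in wanted:
                counts[(bin_start, word)] += 1
    return counts


events = [
    {"earliest_start": 1833, "words": ["negro", "door", "white"]},
    {"earliest_start": 1909, "words": ["black"]},
]
counts = word_counts_by_decade(events, ["negro", "white", "black"])
print(counts[(1900, "black")])  # → 1
```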
What is immediately visible is that the n-word is the most used racial word in the 1830s and 1860s. These decades correspond to two pivotal episodes in the Sutpen saga: the establishment of a racial enslavement regime and its dissolution during and after the Civil War. During Reconstruction, the words Black and Negro are used more frequently, albeit only a few times. Interestingly, the lines bifurcate from 1900 to 1910. This is because the year 1909 is grouped with 1900 and not with 1910. What is likely visible here is the difference between the usage of the n-word by Rosa Coldfield and by Shreve and Quentin.
Because the DY database has mapped the approximate location of the different events in Yoknapatawpha, it is also possible to view this information spatially as it plays out across the narrative. Any number of data points can be plotted, but because this leads to a crowded map, only words directly related to African Americans were used. The different layers can be enabled and disabled by clicking on the legend.
The map confirms for a third time that much of the direct racial language is related to Sutpen’s origin story in Tidewater, Virginia. This is not a dramatic reveal. Scholars have known that this was one of the key incidents of the text. What the linguistic analysis reveals is just how heavily Faulkner concentrated his racial epithets in this one relatively small slice of text to indicate how deeply this one incident shaped Sutpen. In the grand scheme of things, this slight rebuff angers Sutpen so much that it initiates a cascade of choices that perpetuate, but also complicate, America’s racial hierarchy.
The linguistic analysis of Faulkner’s use of racial language in Absalom, Absalom! is quite striking. It demonstrates that he does not use the n-word casually or indiscriminately. Instead, it serves a very specific purpose as a dramatic and thematic marker. It frames the speaker, the person spoken to, and the narrator who re-speaks a communal narrative. Because the n-word is localized predominantly in Sutpen’s origin story, the language is superficially “contained” within a particular person situated in a specific time and place. It is a story that is locked away “over there” and “in the past.” Of course, in Faulkner the past is never locked away. The use of the n-word spreads throughout the text and infects everyone who retells the story. Even if the speakers don’t repeat the word, they are implicated in the violence that brought it to life as one of the most problematic words in the English language. Moreover, the use of the n-word in Sutpen’s origin story serves as a strategic misdirection: his highly charged racial language allows the various speakers to ignore the racial violence that preceded their own social standing. More bluntly, by casting Sutpen as a virulent racist, the narrators absolve themselves of their own racism.
All of the data was generated in the R programming language, using the tidyverse suite of packages for the calculations and the plotly library for the graphics. The full repository is available at https://github.com/joostburgers/absalom_sentiment_analysis. Due to copyright issues, the repository does not include the Absalom, Absalom! text file used for data analysis. The text file used for analysis was the 2011 Vintage Edition (Faulkner 2011), which corresponds to the edition we used for DY.
This re-composition process is quite technical, and the full process is documented here: Absalom, Absalom! Text Processing Supplement.